63 research outputs found

Hedonic price models and indices based on boosting applied to the Dutch housing market

Author: Kagie M.
Wezel M.C. van

We create a hedonic price model for house prices for six geographical submarkets in the Netherlands. Our model is based on a recent data mining technique called boosting. Boosting is an ensemble technique that combines multiple models, in our case decision trees, into a combined prediction. Boosting enables capturing of complex nonlinear relationships and interaction effects between input variables. We report mean relative errors and mean absolute errors for all regions and compare our models with a standard linear regression approach. Our model improves prediction performance by up to 40% compared with linear regression. Next, we interpret the boosted models: we determine the most influential characteristics and graphically depict the relationship between the most important input variables and the house price. We find the size of the house to be the most important input for all but one region, and find some interesting nonlinear relationships between inputs and price. Finally, we construct hedonic price indices and compare these to the mean and median index and find that these indices differ notably in the urban regions of Amsterdam and Rotterdam.

Keywords: data mining; machine learning; gradient boosting; housing; hedonic price models; hedonic price index

Research Papers in Economics

Boosting the accuracy of hedonic pricing models

Author: Kagie M.
Potharst R.
Wezel M.C. van

Hedonic pricing models attempt to model a relationship between object attributes and the object's price. Traditional hedonic pricing models are often parametric models that suffer from misspecification. In this paper we create these models by means of boosted CART models. The method is explained in detail and applied to various datasets. Empirically, we find substantial reduction of errors on out-of-sample data for two out of three datasets compared with a stepwise linear regression model. We interpret the boosted models by partial dependence plots and relative importance plots. This reveals some interesting nonlinearities and differences in attribute importance across the model types.

Keywords: pricing; marketing; data mining; conjoint analysis; ensemble learning; gradient boosting; hedonic pricing
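
The boosting idea named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes squared-error loss, a single numeric attribute, and one-split regression trees (stumps) in place of full CART models, and every name and data value in it (`fit_stump`, `boost`, `predict`, the floor areas and prices) is hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares.

    Assumes xs contains at least two distinct values.
    """
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def boost(xs, ys, rounds=200, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean price
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the shrunken stump outputs on top of the base prediction."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps)


# Hypothetical data: floor area (m^2) against price, with a flattening curve.
areas = [40, 60, 80, 100, 120, 140]
prices = [100.0, 150.0, 180.0, 190.0, 195.0, 197.0]
model = boost(areas, prices)
```

Because each stump is fitted to what the previous stumps left unexplained, the ensemble can follow a nonlinear price curve that a single linear term would miss.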

Research Papers in Economics

Choosing Attribute Weights for Item Dissimilarity using Clickstream Data with an Application to a Product Catalog Map

Author: Groenen P.J.F.
Kagie M.
Wezel M.C. van

Content- and knowledge-based recommender systems often use a measure of (dis)similarity between items. Frequently, this measure is based on the attributes of the items. However, which attributes are important for the users of the system remains an important question to answer. In this paper, we present an approach to determine attribute weights in a dissimilarity measure using clickstream data of an e-commerce website. We count how many times products are sold and, based on these counts, estimate a Poisson regression model. Estimates of this model are then used to determine the attribute weights in the dissimilarity measure. We show an application of this approach on a product catalog of MP3 players provided by Compare Group, owner of the Dutch price comparison site http://www.vergelijk.nl, and show how the dissimilarity measure can be used to improve 2D product catalog visualizations.

Keywords: dissimilarity measure; attribute weights; clickstream data; comparison

Research Papers in Economics

Including Item Characteristics in the Probabilistic Latent Semantic Analysis Model for Collaborative Filtering

Author: Kagie M.
Loos M.J.H.M. van der
Wezel M.C. van

We propose a new hybrid recommender system that combines some advantages of collaborative and content-based recommender systems. While it uses ratings data of all users, as do collaborative recommender systems, it is also able to recommend new items and provide an explanation of its recommendations, as do content-based systems. Our approach is based on the idea that there are communities of users that find the same characteristics important to like or dislike a product. This model is an extension of the probabilistic latent semantic model for collaborative filtering with ideas based on clusterwise linear regression. On a movie data set, we show that the model is competitive with other recommenders and can be used to explain the recommendations to the users.

Keywords: algorithms; probabilistic latent semantic analysis; hybrid recommender systems; recommender systems

Research Papers in Economics

A graphical shopping interface based on product attributes

Author: Groenen P.J.F. (Patrick)
Kagie M. (Martijn)
Wezel M.C. (Michiel) van
Publication date: 27/02/2007

Most recommender systems present recommended products in lists to the user. By doing so, much information is lost about the mutual similarity between recommended products. We propose to represent the mutual similarities of the recommended products in a two-dimensional space, where similar products are located close to each other and dissimilar products far apart. As a dissimilarity measure we use an adaptation of Gower's similarity coefficient based on the attributes of a product. Two recommender systems are developed that use this approach. The first, the graphical recommender system, uses a description given by the user in terms of product attributes of an ideal product. The second system, the graphical shopping interface, allows the user to navigate towards the product he wants. We show a prototype application of both systems to MP3 players.

Erasmus University Digital Repository

Map Based Visualization of Product Catalogs

Author: Groenen P.J.F. (Patrick)
Kagie M. (Martijn)
Wezel M.C. (Michiel) van
Publication venue: Kagie, M. (Martijn)
Publication date: 01/01/2009

Traditionally, recommender systems present recommendations in lists to the user. In content- and knowledge-based recommendation systems these lists are often sorted on some notion of similarity with a query, ideal product specification, or sample product. However, a lot of information is lost in this way, since even two similar products can differ from the query on a completely different set of product characteristics. When using a two-dimensional, that is, a map-based, representation of the recommendations, it is possible to retain this information. In the map we can then position recommendations that are similar to each other in the same area of the map. Both in science and industry an increasing number of two-dimensional graphical interfaces have been introduced over the last years. However, some of them lack a sound scientific foundation, while other approaches are not applicable in a recommendation setting. In our chapter, we describe a framework that has a solid scientific foundation (using state-of-the-art statistical models) and is specifically designed to work with e-commerce product catalogs. The basis of the framework is the Product Catalog Map interface based on multidimensional scaling. Also, we show another type of interface based on nonlinear principal components analysis, which provides an easy way of constraining the space based on specific characteristic values. Then, we discuss some advanced issues. Firstly, we discuss how the product catalog interface can be adapted to better fit the users' notion of importance of attributes using clickstream analysis. Secondly, we show a user interface that combines recommendation by proposing with the map-based approach. Finally, we show how these methods can be applied to a real e-commerce product catalog of MP3 players.

EUR Research Repository

Erasmus University Digital Repository

Advances in Online Shopping Interfaces: Product Catalog Maps and Recommender Systems

Author: Kagie M. (Martijn)
Publication date: 19/05/2010

Over the past two decades the internet has rapidly become an important medium to retrieve information, maintain social contacts, and do online shopping. The latter has some important advantages over traditional shopping. Products are often cheaper on the internet, internet companies sell a wider collection of products, and consumers can buy items whenever they like without leaving their homes. On the other hand, the current state of online shops still has two major disadvantages over 'real' shops: products are often much harder to find than in traditional shops and there are no salesmen to advise the customers. In this thesis, we address both these disadvantages. We introduce and evaluate several new user interfaces for online shops that are based on representing products to the user in maps instead of lists, such that products are easier to find. In these maps similar products are located close to each other. To create these maps, statistical techniques such as multidimensional scaling are used. Furthermore, we combine these maps with recommender systems to address the second disadvantage and to help the user find the product that best suits her needs. Also, we introduce a recommender system that is able to explain the recommendations it gives to users. We think that the methods discussed in this thesis can form a basis for promising new online shopping interfaces both in research and in practice.

EUR Research Repository

Erasmus University Digital Repository

An Empirical Comparison of Dissimilarity Measures for Recommender Systems

Author: Groenen P.J.F. (Patrick)
Kagie M. (Martijn)
Wezel M.C. (Michiel) van
Publication venue: Kagie, M. (Martijn)
Publication date: 01/01/2009

Many content-based recommendation approaches rely on a dissimilarity measure defined on the product attributes. In this paper, we evaluate four dissimilarity measures for product recommendation using an online survey. In this survey, we asked users to specify which products they considered to be relevant recommendations given a reference product. We used microwave ovens as product category. Based on these responses, we create a relative relevance matrix with which we evaluate the dissimilarity measures. Also, we use this matrix to estimate weights to be used in the dissimilarity measures. In this way, we evaluate four dissimilarity measures: the Euclidean Distance, the Hamming Distance, the Heterogeneous Euclidean-Overlap Metric, and the Adapted Gower Coefficient. The evaluation shows that these weights improve recommendation performance. Furthermore, the experiments indicate that when recommending a single product, the Heterogeneous Euclidean-Overlap Metric should be used and when recommending more than one product the Adapted Gower Coefficient is the best alternative. Finally, we compare these dissimilarity measures with a collaborative method and show that this method performs worse than the dissimilarity-based approaches.
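
One of the four measures, the Heterogeneous Euclidean-Overlap Metric, can be sketched as follows. The abstract does not define it, so this sketch assumes the commonly cited unweighted form: range-normalized absolute difference for numeric attributes, overlap (0 if equal, 1 otherwise) for nominal attributes, and a distance of 1 when a value is missing. The weighted variant evaluated in the paper may differ, and the function name and microwave attributes below are hypothetical.

```python
import math


def heom(x, y, numeric_ranges):
    """Unweighted HEOM distance between two products.

    x, y: equal-length sequences of attribute values (None marks a missing value).
    numeric_ranges: per attribute, (max - min) over the catalog for a numeric
    attribute, or None for a nominal attribute.
    """
    total = 0.0
    for xi, yi, value_range in zip(x, y, numeric_ranges):
        if xi is None or yi is None:
            d = 1.0  # missing values count as maximally different
        elif value_range is None:
            d = 0.0 if xi == yi else 1.0  # overlap distance for nominal attributes
        elif value_range > 0:
            d = abs(xi - yi) / value_range  # range-normalized numeric difference
        else:
            d = 0.0  # attribute is constant across the catalog
        total += d * d
    return math.sqrt(total)


# Hypothetical microwave ovens: (power in W, type, capacity in L).
oven_a = (800, "grill", 20)
oven_b = (1000, "solo", 20)
ranges = (400, None, 20)  # assumed catalog ranges; None marks the nominal attribute
distance = heom(oven_a, oven_b, ranges)
```

Normalizing by the attribute range keeps a numeric attribute with large units, such as power in watts, from dominating a nominal attribute, which is the practical reason to prefer this kind of mixed measure over plain Euclidean distance on mixed product data.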

EUR Research Repository

Erasmus University Digital Repository

Determination of Attribute Weights for Recommender Systems Based on Product Popularity

Author: Groenen P.J.F. (Patrick)
Kagie M. (Martijn)
Wezel M.C. (Michiel) van
Publication venue: Kagie, M. (Martijn)
Publication date: 01/01/2009

Content- and knowledge-based recommender systems often use a measure of (dis)similarity between products. Frequently, this measure is based on the attributes of the products. However, which attributes are important for the users of the system remains an important question to answer. In this paper, we present two approaches to determine attribute weights in a dissimilarity measure based on product popularity. We count how many times products are sold and, based on these counts, create two models to determine attribute weights: a Poisson regression model and a novel boosting model minimizing Poisson deviance. We evaluate these two models in two ways, namely using a clickstream analysis on four different product catalogs and a user experiment. The clickstream analysis shows that for each product catalog the standard equal weights model is outperformed by at least one of the weighting models. The user experiment shows that users seem to have a different notion of product similarity in an experimental context.

EUR Research Repository

Erasmus University Digital Repository

Boosting the accuracy of hedonic pricing models

Author: Kagie M. (Martijn)
Potharst R. (Rob)
Wezel M.C. (Michiel) van
Publication date: 01/01/2005

Hedonic pricing models attempt to model a relationship between object attributes and the object's price. Traditional hedonic pricing models are often parametric models that suffer from misspecification. In this paper we create these models by means of boosted CART models. The method is explained in detail and applied to various datasets. Empirically, we find substantial reduction of errors on out-of-sample data for two out of three datasets compared with a stepwise linear regression model. We interpret the boosted models by partial dependence plots and relative importance plots. This reveals some interesting nonlinearities and differences in attribute importance across the model types.

CiteSeerX

EUR Research Repository

Erasmus University Digital Repository